Adaptation Using Out-of-Domain Corpus within EBMT
نویسندگان
چکیده
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an indomain monolingual corpus. We conducted experiments with an EBMT system. The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model.
منابع مشابه
Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System
We describe in this paper a hybrid approach to build automatically bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. We more specifically investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...
متن کاملThe influence of example-data homogeneity on EBMT quality
Homogeneity of large corpora is still a largely unclear notion. In this study we first make a link between the notions of similarity and homogeneity : a large corpus is made of sets of documents to which may be assigned a score in similarity defined by cross-entropic measures, such similarity being implicitly expressed in the data. The distribution of the similarity scores of such subcorpora ma...
متن کاملDeveloping Language Resources for a Transnational Digital Government System
We describe ongoing efforts towards developing language resources for a transnational digital government project aimed at applying information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational movement of illicit drugs. The project seeks to support information sharing, coordination and collaboration among government agencies ...
متن کاملNUS at WMT09: Domain Adaptation Experiments for English-Spanish Machine Translation of News Commentary Text
We describe the system developed by the team of the National University of Singapore for English to Spanish machine translation of News Commentary text for the WMT09 Shared Translation Task. Our approach is based on domain adaptation, combining a small in-domain News Commentary bi-text and a large out-of-domain one from the Europarl corpus, from which we built and combined two separate phrase t...
متن کاملExample-Based Machine Translation for Low-Resource Language Using Chunk-String Templates
Example-Based Machine Translation (EBMT) for low resource language, like Bengali, has low-coverage issues, due to the lack of parallel corpus. In this paper, we propose an EBMT for low resource language, using chunk-string templates (CSTs) and translating unknown words. CSTs consist of a chunk in source-language, a string in target-language, and word alignment information. CSTs are prepared aut...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003